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Theoretical studies in gravitational wave astronomy have mostly focused on the information that 
can be extracted from individual detections, such as the mass of a binary system and its location 
in space. Here we consider how the information from multiple detections can be used to constrain 
astrophysical population models. This seemingly simple problem is made challenging by the high 
dimensionality and high degree of correlation in the parameter spaces that describe the signals, 
and by the complexity of the astrophysical models, which can also depend on a large number of 
parameters, some of which might not be directly constrained by the observations. We present a 
method for constraining population models using a Hierarchical Bayesian modeling approach which 
simultaneously infers the source parameters and population model and provides the joint probability 
distributions for both. We illustrate this approach by considering the constraints that can be placed 
on population models for galactic white dwarf binaries using a future space based gravitational wave 
detector. We find that a mission that is able to resolve ~ 5000 of the shortest period binaries will 
be able to constrain the population model parameters, including the chirp mass distribution and 
a characteristic galaxy disk radius to within a few percent. This compares favorably to existing 
bounds, where electromagnetic observations of stars in the galaxy constrain disk radii to within 
20%. 



I. INTRODUCTION 

There is an old joke in astrophysics that with one 
source you have a discovery, and with two you have a pop- 
ulation. With a population of sources it becomes possible 
to constrain astrophysical models. Until recently, stud- 
ies of milli-Hertz gravitational wave science have either 
focused on making predictions about the source popula- 
tions, or have looked at detection and parameter estima- 
tion for individual sources. These types of studies have 
featured heavily in the science assessment of alternative 
space-based gravitational wave mission concepts, where 
metrics such as detection numbers and histograms of the 
parameter resolution capabilities for fiducial population 
models were used to rate science performance (see eg. 
Ref. [l[). These are certainly useful metrics, but they 
only tell part of the story. A more powerful and infor- 
mative measure of the science capabilities is the ability 
to discriminate between alternative population models. 

Inferring the underlying population model, and the at- 
tendant astrophysical processes responsible for the ob- 
served source distribution, from the time series of a grav- 
itational wave detector is the central science challenge 
for a future space mission. It folds together the diffi- 
cult task of identifying and disentangling the multiple 
overlapping signals that are in the data, inferring the in- 
dividual source parameters, and reconstructing the true 
population distributions from incomplete and imperfect 
information. 

The past few years have seen the first studies of the 
astrophysical model selection problem in the context of 



space based gravitational astronomy. Gair and collabo- 
rators 043 have looked at how extreme mass ratio inspi- 
ral (EMRI) formation scenarios and massive black hole 
binary assembly scenarios can be constrained by GW ob- 
servations using Bayesian model selection with a Poisson 
likelihood function. Plowman and collaborators [f| 0| 
have performed similar studies of black hole population 
models using a frequentist approach based on error ker- 
nels and the Kolmogorov-Smirnov test. Related work on 
astrophysical model selection for ground based detectors 
can be found in Refs. d,|j|. 

We develop a simple yet comprehensive Hierarchical 
Bayesian modeling approach that uses the full multi- 
dimensional and highly correlated parameter uncertain- 
ties of a collection of signals to constrain the joint param- 
eter distributions of the underlying astrophysical models. 
The method is general and can be applied to any number 
of astrophysical model selection problems [ljj-[l2j 

A remarkable feature of the Hierachial Bayesian 
method is that in its purest form it is completely free 
of selection effects such as Malmquist bias. By "purest 
form" we mean where the signal model extends over the 
entire source population, including those with vanish- 
ingly small signal-to-noise ratio ( 1 31 ] - In practice it is 
unclear how to include arbitrarily weak sources in the 
analysis, and in any case the computational cost would 
be prohibitive, so we are forced to make some kind of se- 
lection cuts on the signals, and this will introduce a bias 



if left uncorrected 14 



To illustrate the Hierachical Bayesian approach and to 
investigate where bias can arise, we look at the problem 
of determining the population model for white dwarf bi- 
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naries in the Milky Way. Future space based missions 
are expected to detect thousands to tens of thousands 
of white dwarf binaries Il5l - [l8j . Here we focus on 
determining the spatial distribution and the chirp mass 
distribution, but in future work we plan to extend our 
study to include a wider class of population character- 
istics such as those described in Ref. [l6j]. Determining 
the galaxy shape using gravitational wave observations of 
white dwarf binaries will be an independent measure on 
the shape of the galaxy to complement electromagnetic 
observations. Additionally, the white dwarf binaries that 
are not detectable form a very bright stochastic fore- 
ground. Accurately modeling the confusion foreground 
level is crucial for the detection of extragalactic stochas- 
tic gravitational wave signals . 

The paper is organized as follows: The Hierarchical 
Bayesian approach is described in § [ill an d is illustrated 
using a simple toy model in § IIIII A more realistic toy 
model is developed in § IIVI to explore mis-modeling bi- 
ases that can occur when using Gaussian approximations 
to the likelihood function. In §|V]the method is applied to 
simulated observations of galactic white dwarf binaries, 
and in § IVI Al the possibility of using the Fisher Informa- 
tion Matrix approximation to the likelihood is explored. 
Concluding thoughts follow in § IVIII 



II. HIERARCHICAL BAYESIAN MODELING 

Hierarchical Bayesian modeling has been around since 
at least the 1950's pp| - [23| . but it is only now becom- 
ing widely known and used. The term "hierarchical" 
arises because the analysis has two levels. At the highest 
level are the space of models being considered, and at the 
lower level arc the parameters of the models themselves. 
Hierachical Baycs provides a method to simultaneously 
perform model selection and parameter estimation. In 
this work we will consider models of fixed dimension 
that can be parameterized by smooth functions of one or 
more hyper-parameters. The joint posterior distribution 
for the model parameters A and the hyper-parameters a 
given data s follows from Bayes' theorem: 



p(X,a\s) 



p(s\X,a)p(X\d)p(d) 
p(s) 



(1) 



where p(s\X,d) is the likelihood, p(X\a) is the prior on 
the model parameters for a model described by hyper- 
parameters a, p{d) is the hyper-prior and p(s) is a nor- 
malizing factor 



p(s,d)dd — / p(s\ A, d)p{x\d)p{d)dxdd . (2) 
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The quantity p{s,a) can be interpreted as the "density 
of evidence" for a model with hyper-parameters a. 

The integral marginalizing over the hyper-parameters 
is often only tractable numerically, and this can be com- 
putationally expensive. Empirical Bayes is a collection 



of methods that seek to estimate the hyper-parameters 
in various ways from the data 0, [Hj]. Markov chain 
Monte Carlo (MCMC) techniques allow us to implement 
Hierachical Bayesian modeling without approximation by 
producing samples from the joint posterior distributions, 
which simultaneously informs us about the model param- 
eters A and the hypcr-paramctcrs a. This approach helps 
reduce systematic errors due to mis-modeling, as the data 
helps select the appropriate model. An example of this 
is the use of hyper-parameters in the instrument noise 
model, such that the noise spectral density is treated as 
an unknown to be determined from the data 0, [26|, [27[ ■ 
Hierarchical Bayesian modeling can be extended to dis- 
crete and even disjoint model spaces using the Reverse 
Jump Markov Chain Monte Carlo (RJMCMC) [H] algo- 
rithm. Each discrete models can be assigned its own set 
of continuous hyper-parameters. 



III. TOY MODEL I 

As a simple illustration of hierarchical Bayesian mod- 
eling, consider some population of N signals, each de- 
scribed by a single parameter Xi that is drawn from a 
normal distribution with standard deviation ao- The 
measured values of these parameters are affected by in- 
strument noise that is drawn from a normal distribu- 
tion with standard deviation f3. The maximum likelihood 
value for the parameters is then Xi — ao<5i + /3<$2 where 
the (5's are i.i.d. unit standard deviates. Now suppose 
that we employ a population model where the parame- 
ters are distributed according to a normal distribution 
with standard deviation a. Each choice of a corresponds 
to a particular model with posterior distribution 



N 



p(xi\s, a) = 1 TT 1 

p[s, a) (2nap) 



and model evidence 



e -{x i -x i y/2^ e -xi/2c 



p(s,a) 



-x 2 /2(a 2 +/3 2 ) 



(3) 



(4) 



To arrive at a Hierarchical Bayesian model we elevate a 
to a hyper-parameter and introduce a hyper-prior p(a) 
which yields the joint posterior distribution 



p(xi,a\s) 



p{xi\s,a)p(a) 
p(s) 



(5) 



Rather than selecting a single "best fit" model, Hierar- 
chical Bayesian methods reveal the range of models that 
are consistent with the data. In the more familiar, non- 
hierarchical approach we would maximize the model evi- 
dence (j4]) to find the model that best describes the data, 
which is here given by 
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Since Var(cci) = ag + /3 2 , we have 

a 2 ME = al±O(V2(a 2 + (3 2 )/VN). (7) 

The error estimate comes from the sample variance of 
the variance estimate. In the limit that the experimental 
errors /3 are small compared to the width of the prior cto, 
the error in a scales uniformly as 1/y/N. The scaling is 
more complicated when we have a collection of observa- 
tions with a range of measurement errors. Suppose that 
the measurement errors are large compared to the width 
of the prior, and that we have N\ observations with stan- 
dard error f3\ , N% observations with standard error eic, 
then the error in the estimate for a is 

Recalling that l//3j scales with the signal-to- noise ratio of 
the observation, we see that a few high SNR observations 
constrain a far more effectively than a large number of 
low SNR observations. 

The above calculation shows that the maximum ev- 
idence criteria provides an unbiased estimator for the 
model parameter ao, but only if the measurement noise is 
consistently included in both the likelihood and the simu- 
lation of the Xi . Using the likelihood from Q but failing 
to include the noise in the simulations leads to the biased 
estimate a 2 AE = a 2 , — 1 . Conversely, including noise in 
the simulation and failing to account for it in the likeli- 
hood leads to the biased estimate a 2 AE = a§ + (3 2 . These 
same conclusions apply to the Hierarchical Bayesian ap- 
proach, as we shall sec shortly. 

A. Numerical Simulation 
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FIG. 1: The Marginalized Posterior Distribution Function for 
a. The injected value is indicated by the vertical black line. 
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FIG. 2: MAP values for 30 different simulations of the toy 
model. The red curve includes noise in the simulated signal 
and converges to ao as expected. The blue curves does not 
include noise in the simulation and converges to oiq — /3 2 



The joint posterior distribution can be explored 
using MCMC techniques. To do this we produced sim- 
ulated data with N = 1000, a = 2 and (3 = 0.4 and 
adopted a flat hyper-prior for a. The posterior distribu- 
tion function for a, marginalized over the Xi, is shown 
in Figure[TJ The distribution includes the injected value, 
and has a spread consistent with the error estimate of 
([7]). The Maximum-a- Posteriori (MAP) estimate for a 
has been displaced from the injected value of cto — 2 by 
the simulated noise. 

To test that there is no bias in the recovery of the 
model hyper-parameter a, we produced 30 different re- 
alizations of the data and computed the average MAP 
value. Figure [5] shows the MAP value for each of these 
realizations and the corresponding average. We see that 
as we average over multiple realizations a does indeed 
converge to the injected value. The blue line in Fig. [5] 
shows a biased recovery for a when noise is not included 
in the data. We instead recover a = ^a 2 ,— f3 2 w 1.96. 



IV. TOY MODEL II 

The Hierarchical Bayesian approach produces un- 
biased estimates for the model parameters if the signal 
and the noise (and hence the likelihood) are correctly 
modeled. However, in some situations the cost of com- 
puting the likelihood can be prohibitive, and it becomes 
desirable to use approximations to the likelihood, such 
as the Fisher Information Matrix. For example, to inves- 
tigate how the design of a detector influences its ability 
to discriminate between different astrophysical models, 
it is necessary to Monte Carlo the analysis over many 
realizations of the source population for many different 
instrument designs, which can be very costly using the 
full likelihood. 

To explore these issues we introduce a new toy model 
that more closely resembles the likelihood functions en- 
countered in gravitational wave data analysis. Consider a 
waveform h that represents a single data point (e.g. the 
amplitude of a wavelet or a Fourier component), which 
can be parameterized in terms of the distance to the 



4 



source do- The instrument noise n is assumed to be Gaus- 
sian with variance 1 . Here we will treat the noise level 
P as a hyper-parameter to be determined from the ob- 
servations. Adopting a fiducial noise level /?o allows us 
to define a reference signal-to-noisc ratio SNRq = hi/ ft. 
The likelihood of observing data s = ho + n for a source 
at distance d with noise level j3 is then 

p{s \d,P) = ^-e-^ 2 /^ (9) 

V Z7T p 

where h = (d /d)h . The likelihood is normally dis- 
tributed in the inverse distance l/d, with a maximum 
that depends on the particular noise realization n: 

1 = l + n/(/? SNR ) 
dML d 

Now suppose that the distances follow a one-sided normal 
distribution p(d > 0) = exp(— d 2 /2al), and that we 

adopt a corresponding model for the distance distribution 
with hyper-parameter a and a flat hyper-prior. 

We simulate the data from TV = 1000 sources with 
a = 2 and /? = 0.05. The values of «o and /3 were 
chosen to give a fiducial SNR = 5 for d = 2ao- In the 
first of our simulations the value of (3 was assumed to be 
known and we computed the MAP estimates of a for 30 
different simulated data sets. As shown in Figure [3l the 
average MAP estimate for a converges to the injected 
value. 
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FIG. 3: MAP values for 30 different realizations of the toy 
model II. Using the full likelihood (blue) the MAP values 
converge to the injected value, but with the Fisher Matrix 
approximation to the likelihood (red) there is a bias. 

In contrast to the first toy model where only the com- 
bination a 2 + /3 2 is constrained by the data, in this more 
realistic toy model both the noise level ft and the model 
hyper-parameter a are separately constrained. Figure @] 
shows the marginalized PDFs for both (3 and a. Tests us- 
ing multiple realizations of the data show that the MAP 
values of a and /3 are un-biased estimators of the injected 
parameter values. 




FIG. 4: PDFs for the prior hyper-parameter a and the noise 
level p for toy model II. Both are individually constrained in 
this model. The injected values are shown by the black lines. 

A. Approximating the Likelihood 

For stationary and Gaussian instrument noise the log 
likelihood for a signal described by parameters A is given 
by 

L{X) = -\{s-h{X)\s-h{X)) (11) 

where (a\b) denotes the standard noise- weighted inner 
product, and we have supressed terms that depend on the 
noise hyper-parameters. We can expand the waveform 
h(X) about the injected source parameters Ao: 

h(X) = h(X ) + AA'A, + AA'AA '//.,, + 0(AA 3 ) (12) 

where A A = A — Ao, and it is understood that the deriva- 
tives are evaluated at Ao- Expanding the log likelihood 
we find: 

L(AA) = - -{n\n) + AX^nlh.i) 

- ^AX'AX (h^h^) + 0(AX 3 ) . (13) 

The maximum likelihood solution is found from 
dL/dAX 1 = 0, which yields AA^l = (n\hj)T ij , where 
r y is the inverse of the Fisher Information Matrix = 
(h t i\hj). Using this solution to eliminate (n\h t i) from 
(fl~3"|) yields the quadratic, Fisher Information Matrix ap- 
proximation to the likelihood: 

L(A) = const. - i(A l - A^)^" - A^)r« . (14) 

This form of the likelihood can be used in simulations by 
drawing the AAJ^ L from a multi-variatc normal distribu- 
tion with covariance matrix T lJ . 

In our toy model T dd = SNR^/?g/(/3 2 ^), and L{d) = 
-SNR^/3g(d - c? ML ) 2 /(2/3 2 ^). The approximate likeli- 
hood follows a normal distribution in d while the full 
likelihood follows a normal distribution in l/d. For sig- 
nals with large SNR this makes little difference, but at 
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low SNR the difference becomes significant and results 
in a bias in the recovery of the model hyper-parameters, 
as shown in Figure |3] In this instance there is a simple 
remedy: using u = l/d in place of d in the quadratic ap- 
proximation to the likelihood exactly reproduces the full 
likelihood in this simple toy model. However, it is not al- 
ways so easy to correct the deficiencies of the quadratic, 
Fisher Information Matrix approximation to the likeli- 
hood. 



V. WHITE DWARF BINARIES IN THE MILKY 

WAY 

To illustrate how the Hierarchical Bayesian approach 
can be applied to an astrophysically relevant problem, we 
investigate how population models for the distribution of 
white dwarf binaries in the Milky Way galaxy can be con- 
strained by data from a space based gravitational wave 
detector. Several studies have looked at parameter esti- 
mation for individual white dwarf binaries in the Milky 
Way [29M3]| . We extend these studies to consider how 
the individual observations can be combined to infer the 
spatial and mass distributions of white dwarf binaries in 
the Galaxy. 

We use the Laser Interferometer Space Antenna 
(LISA) [32j as our reference mission. We focus this anal- 
ysis on short-period galactic binaries, with gravitational 
wave frequencies above 4 mHz. Our conclusions would 
be little changed if we considered the recently proposed 
eLISA [I mission instead, as both are able to detect 
roughly the same number of galactic binaries in the fre- 
quency bands considered here. 

The 4 mHz lower limit is chosen to simplify the anal- 
ysis in two ways. Firstly, it avoids the signal overlap 
and source confusion problems that become significant 
at lower frequencies [151 ] . and secondly, it circumvents 
the issue of sample completeness and Malmquist selection 
bias since LISA's coverage of the galaxy is complete at 
high frequencies. This claim is substantiated in Figure [5] 
showing the cumulative percentage of binaries detected 
as a function of frequency for a 4 year LISA mission. A 
given frequency bin represents the percentage of binaries 
of that frequency and higher that are detected. All bi- 
naries above ~ 4 mHz are detectable by LISA, of which 
there are ~ 5000. 

It is possible to extend our analysis to include all de- 
tectable white dwarf binaries if we were to properly ac- 
count for the undetectable sources. One way to do this is 
to convolve the astrophysical model priors by a function 
that accounts for the selection effects [l4[ so that we are 
working with the predicted observed distribution rather 
than the theoretical distribution. Another approach is to 
marginalize over the un-detectable signals |l3j |. 

The high frequency signals are not only the simplest to 
analyze, but they also tend to have the highest signal-to- 
noise ratios, the best sky localization, and the best mass 
and distance determination due to their more pronounced 



evolution in frequency. When simulating the population 
of detectable sources we will assume that binaries of all 
frequencies above 4 mHz are homogeneously distributed 
throughout the galaxy and share the same chirp mass 
distribution. In reality the population is likely to be more 
heterogenous, and more complicated population models 
will have to be used. 




0.006 
Freqoency (Hz) 



FIG. 5: The percentage of sources which are detectable as 
a function of frequency. Virtually 100% of the white dwarf 
binaries in the Milky Way above 4mHz would be detected by 
LISA. 



A. Likelihood 

The likelihood for a single source is given by: 

p(s\X) = Ce-( s - h ^ s - h{ ^^. (15) 

Here p(s\X) is the likelihood that the residual s — h(X) 
is drawn from Gaussian noise, where s is the data, 
and h(X) is the signal produced in the detector by a 
source described by parameters A. The simulated data 
s = h(Xo) + n includes a waveform h(Xo) and a realiza- 
tion of the LISA instrument noise n. The normalization 
constant C depends on the instrument noise levels, but 
is independent of the waveform parameters. 

The waveform for a white dwarf binary is well approx- 
imated by: 

14OM0 2 /1 + cosV 
d ~ 



h+(t) 



cos(Qi) 



, . , 1 AGMn 2 . , n . 
"■x(i) = -; i cos Lsmiilt) 



(16) 



where = 2nf. We have 8 parameters that describe a 
white dwarf binary signal, the frequency /, the distance 
to the source d, the chirp mass M, the inclination angle 1, 
a polarization angle tJj, a phase angle (po, and sky location 
parameters 9 and cf>. To leading order, the frequency 
evolves as: 

/= 96*^)6/3^1/3 . (17) 
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Sources with /T 2 SNR ~ l ; where T is the observation 
time, provided useful measurments of the chirp mass M 
and the distance d [13, [34{ ■ The strong / dependence in 
(|17[) is the reason why high frequency binaries are the 
best candidates for placing strong constraints on the dis- 
tance and chirp mass. 



where Mo, a and b are hyper-parameters in our model. 
C is the normalization constant which can be calculated 
analytically and is given by: 



C = M Q ir- 



a + 1 _ 

b^a 



Tr(b-l) 



(20) 



B. Model for the Galaxy 

We adopt a bulge plus disk model for the galaxy 
shape [3a - l38| . Choosing the x-y plane as the plane of 
the galaxy, the density of stars in the galaxy is given by: 

p(x, y, z) = po (Ae- r2 l R > + (1 - A) e - U / R «sech 2 (z/Z d )) 

(18) 

Here, r 2 = x 2 + y 2 + z 2 , u 2 = x 2 + y 2 , Rb is the charac- 
teristic radius for the bulge, and Rd and Zd are a charac- 
teristic radius and height for the disk respectively. The 
quantity po is a reference density of stars and the co- 
efficient A, which ranges between and 1, weights the 
number of stars in the bulge versus the number in the 
disk. We produced synthetic galaxies using the catalog 
of binaries provided by Gijs Nelemans for the Mock LISA 
Data Challenges (MLDC) [H. 

With appropriate normalization, the spatial density p 
becomes our prior distribution for the spatial distribu- 
tion of galactic binaries. The parameters of the density 
distribution A, Rb, Rd and Zd become hyper-parameters 
in the Hierarchical Bayesian analysis. Each set of values 
for the four parameters corresponds to a distinct model 
for the shape of the galaxy. For our simulations, we chose 
a galaxy with A — 0.25, Rb = 500pc, Rd = 2500pc, and 
Z d = 200pc. 



C. Chirp Mass Prior 



Mo is the mode of the distribution. The hyper- 
parameters a and b determine the width of the distri- 
bution, which can be seen by calculating the full width 
at half maximum (FWHM). It is given by: 



FWHM ~ Mo f[2(6/a+l) 



,1/6 



[2(o/6 + l)] 



-l/a 



(21) 

We further assume that the orbital evolution is due 
only to the emission of gravitational waves, and is thus 
adequately described by (|17[) . In principle one would 
want to be more careful and consider tidal effects and 
mass transfer [4(| as possible contributions to /. How- 
ever, it is expected that the high frequency sources we 
are focusing on will be mostly detached white dwarf bi- 
naries where tidal or mass transfer effects are unlikely to 
be significant fill ]. 



True Chirp Masses ■ 
Chirp Masses at MAP 
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Our ability to measure the hyper-parameters of the 
spatial distribution depends on how well we measure the 
sky location and distance for each binary. For many 
sources, the distance is poorly determined because it is 
highly correlated with the chirp mass. However, there 
are enough binaries that have sufficiently high frequency, 
chirp mass and/or SNR to provide tight constraints on 
the chip mass distribution. The empirically determined 
chirp mass distribution then functions as a prior for the 
lower SNR, less massive, or lower frequency sources, and 
improves their distance determination. 

Figure [6] shows the chirp mass distribution for binaries 
in our simulated galaxy. We use this distribution to con- 
struct a hyper-prior on the chirp mass, approximated by 
the following distribution: 

P(M) = ° - . b (19) 

I M ) i o M \ 
\M ) 6 \M ) 



FIG. 6: The chirp mass distribution of the 5000 binaries used 
in our simulations is shown in red. The green distribution 
shows the MAP values of the recovered chirp mass for each 
binary, and the blue shows the model (|19[) using the MAP val- 
ues for the chirp mass prior hyper-parameters. The brightest 
binaries accurately capture the chirp mass distribution, which 
serves as a useful prior for sources whose chirp masses are not 
so well determined. 



VI. RESULTS 

We are able to efficiently calculate the full likelihood 
for each source (eq. [T5)) using the fast waveform generator 
developed by Cornish and Littenberg [26| . The following 
results are all derived from simulations using the full like- 
lihood. Using the same MCMC approach from our toy 
models, we sample the posterior and get PDFs for source 
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and model parameters simultaneously. We check for con- 
vergence by starting the chains at different locations in 
the prior volume and find that regardless of starting lo- 
cation, the chains converge to the same PDFs. 

Our procedure successfully recovers the correct chirp 
mass distribution, as shown in Figure [5] and is able to 
meaningfully constrain the parameters of the galaxy dis- 
tribution and chirp mass distribution models, with PDFs 
shown in Figure [7] and Figure [5] respectively. 
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FIG. 7: PDFs for the four galaxy model hyper-parameters. 
The red is for a simulation using 100 binaries, the green 1000 
binaries, and the blue 5000 binaries. The black lines show the 
true values of the distribution from which the binaries were 
drawn. 
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FIG. 8: PDFs for the three chirp mass model hyper- 
parameters and the FWHM of the distribution. The red is 
for a simulation using 100 binaries, the green 1000 binaries, 
and the blue 5000 binaries. 

We ran simulations with 100, 1000, and 5000 bina- 
ries to show how the constraints on the galaxy hyper- 
parameters improved as we include more sources (for 
comparison, cLISA is expected to detect between 3500- 
4100 white dwarf binaries during a 2-year mission life- 
time @). The chains ran for 1 million, 500k, and 100k 



iterations respectively. Even for a relatively modest num- 
ber of detections we begin to get meaningful measure- 
ments on the population model of white dwarf binary 
systems. The more binaries we use in our analysis the 
tighter our constraints on the hyper-parameters. 



Parameter 


100 
MAP a 


1000 
MAP a 


5000 
MAP a 


A 

Rb (pc) 
Rd (pc) 
Zd (pc) 


0.262 0.047 
440 58.9 

2465 237.5 
193 20.8 


0.226 0.0157 
490 17.1 
2584 70.2 
201 7.02 


0.249 0.0074 
480 8.38 
2461 32.4 
195 3.25 


Mo 
FWHM 


0.226 0.0063 
0.07 0.0094 


0.208 0.0018 
0.071 0.0026 


0.205 0.00088 
0.076 0.0014 



TABLE I: MAP values and variances for the galaxy hyper- 
parameters when using 100, 1000 and 5000 galactic binaries 
in the analysis. The simulated values were A = 0.25, Rb = 
500pc, R d = 2500pc, and Z d = 200pc. 

Table 1 lists the recovered MAP values and the vari- 
ance of the marginalized posterior distribution function 
for each hyper-paramctcr. Gravitational wave observa- 
tions would be very competitive with existing electro- 
magnetic observations in constraining the shape of the 
galaxy [42j, [43} . Making direct comparisons between our 
results to those in the literature is complicated, as the 
actual values of the bulge and disk radii are very model 
dependent. For example, Juric uses a model where the 
galaxy is comprised of both a thin and thick disk. With 
GW data in hand, this comparison could easily be made 
by trivially substituting the density profile used here. 

What matters for this proof-of-principal study is how 
well the parameters can be constrained. In the models 
of Juric et al constraints for the disk radii are around 
20%. We find similar accuracy when using a pessimistic 
population of 100 systems. Adopting a source catalog 
that is more consistent with theoretical predictions, we 
find constraints for the disk parameters as low as 1.5% - 
a substantial improvement over the state-of-the-art. 



A. Approximating the Likelihood 

While we happen to have a very efficient method for 
computing the full likelihood for galactic binaries, this is 
not always the case. For other signal types the full likeli- 
hood can be very expensive to compute, posing problems 
if we wish to do extensive studies of many astrophysical 
models or detector configurations. For such exploratory 
studies it is preferable to use the Fisher Information Ma- 
trix approximation to the likelihood of (fhl"]). However, 
as we saw with the toy model in mVl this can lead to 
biases in the recovered parameters. The Fisher matrix 
Tij is not a coordinate invariant quantity, and we can at 
least partially correct the bias by reparamcterizing our 
likelihood. Just as in mVl instead of using the distance 
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d as a variable, we can instead use 1/d, which provides 
a much better approximation to the full likelihood. We 
test these short-cuts by redoing the analysis of the galac- 
tic population using the Fisher matrix approximation to 
the likelihood (both with d and 1/d as parameters) and 
compare to the results from the previous analysis using 
the full likelihood. Figure O shows PDFs for the galaxy 
hyper-parameters using the three different methods for 
computing p(d\X) with the full sample of 5000 binaries. 




? 0.21 0.220.230.240.250.260.270.280.29 0.3 
Coefficient A 



2300 2350 2400 2450 2500 2550 2600 2650 2700 
Disk Radius R d 




480 500 520 
Bulge Radius R b 



FIG. 9: PDFs from a simulation using 5000 binaries for the 
four galaxy model hyper-parameters for the full likelihood 
(red), a Fisher approximation in d (green), and a Fisher ap- 
proximation in 1/d (blue). 

We find that the approximation using 1 j d matches the 
full likelihood better than the likelihood parameterized 
with d, however there are additional discrepancies due 
to non-quadratic terms in the sky location {9, </>} that we 
have not accounted for. The dependence of the waveform 
on {9, <f\ is more complicated than the distance, and is 
not so easily corrected by a simple reparameterization. 
The approximation could be improved by carrying the 
expansion of the likelihood beyond second order, however 
this is computationally expensive and can be numerically 
unstable. 

If we analyze several realizations of the galaxy using 
the three different likelihood functions and average the 
results, we find the biases are persistent for the approx- 
imate methods. Figure [TU] shows the MAP values and 
the average of the MAP values for 10 realizations of our 
fiducial galaxy model. The biases in the recovered disk 
radius and disk height arc particularly pronounced when 
using the Fisher Matrix approximation to the likelihood 
parameterized with d. 



VII. CONCLUSION 

Wc have demonstrated a general Hierarchical Bayesian 
method capable of constraining the model parameters 
for a population of sources. In the particular case of 
white dwarf binaries in the Milky Way, we can constrain 




Disk Radius R d 




1 2 3 4 5 6 7 

Bulge Radius R b Disk Height Z d 

FIG. 10: MAP values and corresponding averages from a sim- 
ulations using 5000 binaries for the four galaxy model hyper- 
parameters for the full likelihood (red), a Fisher Matrix ap- 
proximation parameterized with d (green), and a Fisher Ma- 
trix approximation using 1/d (blue). 



the spatial distribution of the galaxy to levels better 
than current electromagnetic observations using the an- 
ticipated number of systems detectable by space-based 
gravitational wave detectors. Even if the currently held 
event rates for white dwarf binaries turn out to be op- 
timistic by more than an order of magnitude, the con- 
straints possible with a gravitational wave detector are 
comparable to our current estimates of the Milky Way's 
shape. 

When the data from a space-borne detector has been 
collected, the resolvable white dwarf binaries will be re- 
gressed from the data, leaving behind a confusion-limited 
foreground which will significantly contribute to the over- 
all power in the data around ~ 1 mHz. Measuring the 
overall shape of the galaxy as demonstrated here will 
provide additional means to characterize the level of the 
confusion noise. As we will show in an upcoming paper, 
we can then use the detailed understanding of the fore- 
ground signal to detect a stochastic gravitational wave 
background at levels well below the confusion noise. 

Analyzing simulated data with the full likelihood is 
computationally taxing and, when performing a large 
suite of such studies, could prove to be prohibitive. 
To mitigate the cost of such analyses, we test a much 
faster approach (approximately 50 times faster), using 
the Fisher matrix approximation to the likelihood. Wc 
find the results are significantly less biased by the Fisher 
approximation when using 1 / d as the parameter that en- 
codes the distance to the source. This simple adjustment 
gives adequately reliable results in significantly less time 
than the brute-force calculation, and will provide an ad- 
ditional, useful, metric to gauge the relative merits of 
proposed space-based gravitational wave missions. 
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